This paper proposes a relational constraint driven technique that synthesizes test cases automatically 
for web applications. Using a static analysis, servlets can be modeled as relational transducers, 
which manipulate backend databases. We present a synthesis algorithm that generates a sequence 
of HTTP requests for simulating a user session. The algorithm relies on backward symbolic image 
computation for reaching a certain database state, given a code coverage objective. With a slight 
adaptation, the technique can be used for discovering workflow attacks on web applications. 

1 Introduction 

Modern web applications usually rely on backend database systems for storing important system
information or supporting business decisions. The complexity of database queries, however, often
complicates the task of thoroughly testing a web application. Manually designing test cases involves
labor-intensive initialization of database systems, even with the help of unit testing tools such as
SQLUnit [25] and DBUnit [?]. It is desirable to automatically synthesize test cases for web applications.

There has been strong recent interest in testing database driven applications and database
management systems (see, e.g., [12, 4, 20]). Many of these approaches are query aware, i.e., given a
SQL query, an initial database (DB) instance is generated to make that query satisfiable. The DB
instance is fed to the target web application as input, so that a certain code coverage goal is achieved.
The problem we tackle goes one step further. It is a synthesis problem: given a certain database state
(or a relational constraint), a call sequence of web servlets is synthesized to reach the given DB state.
This is motivated by the special architecture of web applications. Unlike typical desktop software
systems, the atomic components of a web application, i.e., web servlets, are accessible to end users.
In addition, cookies and session variables are frequently used for session maintenance and user
tracking. Thus web application testing is usually session oriented (see, e.g., [22, 11]). Unlike in unit
testing of general database driven applications, an intermediate database state has to result from a
call sequence and cannot be initialized at will. We are interested in synthesizing such a call sequence.
The technique could also be leveraged to detect workflow attacks [3], where an "unexpected" call
sequence can harm the business logic of a web application (e.g., shipping a product without charging
a credit card).

We propose a white-box analysis, which consists of the following steps: (1) interface extraction: each
web servlet is modeled as a collection of path transducers. A path transducer is essentially a relational
transducer [2] that corresponds to a single execution path of the web servlet. A transducer is represented
as a pair of pre/post conditions, built upon relational constraints. Path transducers are extracted using
a lightweight symbolic execution technique. (2) coverage goal generation: to achieve a certain coverage
goal, symbolic execution is used to generate relational constraints (expressed using first order relational
logic [?]). (3) call sequence synthesis: a heuristic algorithm is used to determine a call sequence that
could lead to the required intermediate database state. Symbolic constraint solving is currently used for
performing backward/forward symbolic image computation of transducers. Best effort constraint solving
(restricted to finite model and finite scope), based on Alloy Analyzer [17], is used to generate parameters
of each HTTP request in the call sequence.

Although there does not exist an implementation for the proposed technique, we plan to illustrate its 
effectiveness and feasibility using a case study. §2 presents a motivating example. §3 introduces the path 
transducer model. §4 discusses symbolic image computation of relational algebra. The call sequence 
algorithm is presented in §5. §6 adapts the algorithm for discovering workflow attacks. §7 discusses 
related work and concludes. 

2 Motivating Example 

This section introduces SimpleScarf, a case study example used throughout the paper. SimpleScarf is
adapted from the Scarf conference management system [26]. It comprises five servlets for managing
the membership of a conference. Each servlet is implemented as a PHP file that accepts HTTP requests
and generates HTML responses.

These servlets are briefly described as follows. Later we will present the formal relational
transducer model for each of them. (1) Showsessions.php displays the list of paper sessions that are
associated with the current user. (2) Insertsession.php adds a new paper session to the system. (3)
Addmember.php inserts a new member into a paper session. (4) Generaloptions.php creates a new
user of the system. (5) Login.php authenticates the log-in process. Once a user successfully logs in, the
servlet sets a session variable called "uname" for tracking the user session.

There is a backend database with three relations: (1) Users(uname, enc_ud) contains information
about a user: the user name and the encrypted password; (2) Sessions(sid, sname) has two attributes:
the paper session id and session name; and (3) Member(sid, uname) describes the members of a
paper session. To simplify the scenario, the data type of each column is varchar. For relation Member,
there are two foreign key dependency constraints: (1) sid on Sessions, and (2) uname on Users.
Throughout the paper, each relation or servlet is denoted using the first character of its name, e.g.,
Showsessions.php is denoted by S and Users by U.

As an example, Listing 1 presents a fragment of Showsessions.php. The servlet consists of two
parts: (1) a branch statement which examines a user's login status, by checking the value of session
variable "uname"; and (2) a loop that reads the information retrieved by a SELECT statement to generate
a list of paper sessions related to the user.

Coverage Goal: line 45 in Listing 1. We are interested in two questions: (1) what is the database
state or relational constraint that leads the execution of Showsessions.php to line 45? (2) what is
the call sequence that generates the desired database state? Later, we will show that an algorithm can
automatically synthesize the call sequence IGALS (Insertsession, Generaloptions, Addmember, Login,
Showsessions) and the parameters of each HTTP request.

3 Path Transducer Model 

We intend to model each execution path of a servlet as an atomic relational transducer. For example,
one such execution path of Showsessions.php could be 1-2-3-4-6-7-8-12-13-15-16-44-45-44-90 (line
numbers). We trace such an execution using a "path condition", i.e., the conjunction of branch conditions
along the execution path. Note that each servlet may have an infinite set of such transducers. During a
static analysis, we can bound loop iterations and recursive call depth for achieving a finite set.

 1 <?php
 2 include_once("header.php");
 3 print "<div style='float:right'>";
 4 if (!isset($_SESSION['uname']))
 5   print "<a>Not logged in</a><br> ";
 6 else
 7   print "Logged in as $_SESSION['uname']
 8 print "</div><br>";
12 $result = query("SELECT session_id ...");
13 if (mysql_num_rows($result) == 0) {
14   print("Sorry, ...")
15 }
16 $curday = -1;
44 while ($row = mysql_fetch_array($result)) {
45   $day = date("F j", $row['starttime']);
90 }

Listing 1: Fragment of Showsessions.php

Because each transducer only models one path, it is essentially a transition rule that is expressed
as a Boolean combination of relational expressions. The first order relational logic [?] is suitable for
expressing uninterpreted functions and next state variables and relations. Here a primed variable (i.e.,
with a single quote) represents the value of the variable in the next state. Input parameters are preceded
with a dollar sign, e.g., $u, $p, $s represent the input parameters uname, password, and session name in
an HTTP request. Session variables are denoted using #, e.g., #u represents the session variable uname.
If a session variable or relation does not appear in primed form in a transition rule, then its value
remains the same after the execution of the transition. Relations are generally represented as sets of tuples.

Definition 3.1 A path transducer is a tuple \((\mathcal{S}, \mathrm{Dom}, V, A)\) where \(\mathcal{S}\) is the data schema (a
finite set of relation schemas), \(\mathrm{Dom}\) is an infinite but countable domain for \(\mathcal{S}\), \(V\) is a finite set of
session variables (letting \(V'\) be the set of primed variables), and \(A\) is a Boolean combination of terms.
Each term has one of the following forms:

1. (equality) \(u = E(V, \mathrm{Dom})\), where \(u \in V \cup V'\) and \(E\) is an expression on the current state of
variables and constants from \(\mathrm{Dom}\).

2. (satisfiability check) \(SAT(\Psi)\) where \(\Psi\) is a relational algebra formula on \(\mathcal{S}\). □

Notice that in the equality form, a primed variable (next state) is allowed to appear on the LHS
(left hand side) only. The syntax (expressiveness) of \(E\) is affected by the decision procedure used in
analysis. Currently, we allow Presburger arithmetic [19] and Simple Linear String Equation [13].
Uninterpreted functions are allowed with first order relational logic on \(\mathcal{S}\), but not with Presburger
arithmetic. A relational constraint is defined similarly to a transducer. It is a Boolean combination of
equality and satisfiability terms, except that no primed variable occurs. It is formally defined as below.

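Definition 3.2 Let \(\mathcal{A}\) be the set of relational algebra formulae over \(\mathcal{S}\). If \(\varphi_1, \varphi_2 \in \mathcal{A}\), then all of
the following also belong to \(\mathcal{A}\): (1) (selection) \(\sigma_{i=x}\varphi_1\) where \(i \in \mathbb{N}\) and \(x \in V \cup \mathrm{Dom}\);
(2) (projection) \(\pi_{i_1,i_2,\ldots,i_k}\varphi_1\) where \(i_1, i_2, \ldots, i_k \in \mathbb{N}\); (3) (cross-product) \(\varphi_1 \times \varphi_2\);
(4) (union) \(\varphi_1 \cup \varphi_2\); and (5) (difference) \(\varphi_1 - \varphi_2\). □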
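
The five operators of Definition 3.2 can be illustrated on a concrete database instance. The following is a minimal Python sketch, not part of the proposed technique: relations are modeled as sets of tuples with 1-indexed columns, and the helper names (`select`, `project`, `cross`, `sat`) and the sample rows are hypothetical. On a concrete instance, the SAT check degenerates to a non-emptiness test, whereas the paper's SAT term is a symbolic satisfiability check.

```python
from itertools import product

# Relations are sets of tuples; column positions are 1-indexed as in Definition 3.2.

def select(rel, i, x):
    """Selection sigma_{i=x}: keep tuples whose i-th column equals x."""
    return {t for t in rel if t[i - 1] == x}

def project(rel, *cols):
    """Projection pi_{i1,...,ik}: keep the listed columns, in the given order."""
    return {tuple(t[c - 1] for c in cols) for t in rel}

def cross(r1, r2):
    """Cross-product: concatenate every tuple of r1 with every tuple of r2."""
    return {a + b for a, b in product(r1, r2)}

def union(r1, r2):
    return r1 | r2

def difference(r1, r2):
    return r1 - r2

def sat(rel):
    """SAT on a concrete instance (assumption): the formula evaluates to a non-empty relation."""
    return len(rel) > 0

# Hypothetical instance of Sessions (S) and Member (M).
S = {(1, "s1"), (2, "s2")}
M = {(1, "alice"), (2, "bob")}

# sigma_{2=#u} M with #u = "alice": is the user a member of some session?
alice_is_member = sat(select(M, 2, "alice"))
```

With this instance, `alice_is_member` is true because the tuple `(1, "alice")` survives the selection.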
Figure 1: Table of Notations 

We now list one sample path transducer for each of the five servlets. We first describe the function
of a servlet and then formally define it as a transition rule. Figure 1 summarizes the semantics of all
notations to be used in the formulae.

Showsessions.php: The servlet first checks the existence of a session variable "uname" (represented by 
#u), and then verifies if there is at least one session of which #u is a member. If the condition evaluates 
to true, the servlet executes a loop that displays the session name by retrieving information from table 
Sessions (S). Because the loop does not have any side effects (e.g., updating databases), the operations 
on S are not included in the path condition during symbolic execution. The following constraint is used 
to represent the transducer. 

\[ \#u \neq null \wedge SAT(\sigma_{2=\#u} M) \qquad (1) \]

Here SAT tests if the input (a relational algebra formula) is satisfiable. For convenience, we omitted 
the formula #u' =#u A S' = S A M' = M A U' = U (i.e., maintaining the values of all variables and 
relations), but it needs to be encoded in implementation. 

Insertsession.php: The servlet inserts a new session record into relation S. It accepts one input
parameter \(\$s_I\) (the session name). The primary key sid is automatically incremented by the system.
The action is modeled using the standard set union operator. It is also an example of integer constraints
involved in the application.

\[ S' = S \cup \{(|S| + 1, \$s_I)\} \qquad (2) \]

Addmember.php: The servlet takes two parameters: user name (\(\$u_A\)) and session name (\(\$s_A\)). It
looks up the corresponding session id and then inserts a tuple into relation M (Member). Here
we assume that the primary key constraint is valid (every record in S has a distinct sid). So at any time,
at most one new record is inserted into M.

\[ M' = M \cup (\pi_1(\sigma_{2=\$s_A}(S)) \times \{(\$u_A)\}) \qquad (3) \]

Generaloptions.php: The transducer adds a user to the system. It takes two parameters: user name
(\(\$u_G\)) and password (\(\$p_G\)). \(f\) is an uninterpreted function that represents the encryption procedure.
All uninterpreted functions are assumed to be deterministic.

\[ U' = U \cup \{(\$u_G, f(\$p_G))\} \wedge r(\$p_G) \qquad (4) \]

In practice, there are security conditions that a legitimate user name and password must meet. This
is represented using function \(r(\$p_G)\) where \(r\) is a Boolean function \(\Sigma^* \to \{T, F\}\). For example, we
can define it as \(r(x) = x.\mathrm{contains}(\ldots) \wedge |x| > 1\). The solution of string constraints can be handled by
string analysis (see, e.g., [13]).

Login.php: The servlet takes two parameters, uname and password, and then verifies their
existence in the database. It updates the session variable #u with the value of \(\$u_L\), so that the user
session can now be traced by #u. The servlet assumes that #u is not set before the login activity. Also
note that \((\$u_L, f(\$p_L)) \in U\) is syntactic sugar for \(SAT(\sigma_{1=\$u_L \wedge 2=f(\$p_L)}(U))\).

\[ (\$u_L, f(\$p_L)) \in U \wedge \#u' = \$u_L \wedge \#u = null \qquad (5) \]

3.1 Discussions 

We outline the idea of extracting path transducer models, though an implementation does not exist yet.
Symbolic execution [21] is the major technical approach, and Halfond's recent work [16] on interface
extraction can be leveraged. A servlet is treated as a program that takes global input (the GET and
POST variables). These global variables are replaced by symbolic literals. When an assignment occurs,
the variable on the LHS (left hand side) is associated with a symbolic expression. At a branch
decision, the jump to one of the branches is made randomly (so that each branch can eventually be
covered over repeated runs). When entering a branch, the corresponding condition is conjoined with
the current path condition. The path condition thus records the conjunction of all conditions that the
initial input has to meet to reach the current execution location. Some known system calls (e.g., those
to backend databases) are intercepted and replaced with the corresponding symbolic expression. Others
(unhandled) are abstracted as uninterpreted functions. When the symbolic execution completes, the path
transducer is the conjunction of three components: (1) the path condition, (2) assignments that change
values of global session variables, and (3) operations that update the contents of database relations.
String analysis plays an important role in deciding the precision of the translation from symbolic string
expressions to relational algebra formulae. Given a string expression (consisting of constant words and
string variables) that represents a SQL query, populating the string variables with "benign" values
generates a real SQL query, which can be parsed into a SQL syntax tree and then translated into
relational algebra formulae (with variables reloaded). Note that here we ignore SQL injection attacks.
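
The composition of a path transducer from a path condition and its side effects can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the extraction procedure itself: the servlet is hand-encoded as a nested list (the `LOGIN` structure is a hypothetical abstraction of Login.php), conditions and updates are plain strings, and paths are enumerated exhaustively instead of by random branch choice.

```python
def enumerate_paths(block, cond=(), effects=()):
    """Yield (path_condition, effects) for every execution path through `block`.

    A block is a list of statements; a statement is either
    ("if", condition, then_block, else_block) or ("effect", update_text).
    """
    if not block:
        yield cond, effects
        return
    head, rest = block[0], block[1:]
    if head[0] == "if":
        _, c, then_block, else_block = head
        # Taking a branch conjoins its condition with the current path condition.
        yield from enumerate_paths(then_block + rest, cond + (c,), effects)
        yield from enumerate_paths(else_block + rest, cond + ("not (" + c + ")",), effects)
    else:
        # Session-variable assignments and database updates are recorded as effects.
        yield from enumerate_paths(rest, cond, effects + (head[1],))

def to_transducer(cond, effects):
    """A path transducer is the conjunction of the path condition and the recorded effects."""
    return " and ".join(cond + effects)

# Hypothetical abstraction of Login.php: two nested checks, one session-variable update.
LOGIN = [
    ("if", "($uL, f($pL)) in U",
        [("if", "#u = null", [("effect", "#u' = $uL")], [])],
        []),
]

paths = list(enumerate_paths(LOGIN))
login_transducer = to_transducer(*paths[0])
```

The first enumerated path takes both "then" branches, and its conjunction has the same shape as transition rule (5); the other two paths carry negated conditions and no effects.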

4 Relational Constraint 

This section introduces the preliminary results about relational constraints. Inspired by the work on
testing DBMS by Khurshid et al. [20], we translate a relational algebra formula to a relational logic
formula and then use Alloy Analyzer [17] to find a model in a finite scope. Our initial experiments show
that primary and foreign key constraints can be conveniently modeled. Within a small scope, Alloy can
quickly find solutions that satisfy a relational constraint.

Let \(\mathcal{S}\), \(V\), and \(\mathrm{Dom}\) represent the data schema, the set of session variables, and the data domain,
respectively. Given a transducer \(T = (\mathcal{S}, \mathrm{Dom}, V, A)\), we say a state of \(T\) is a database instance of \(\mathcal{S}\)
together with a valuation of all variables in \(V\). Given two states \(s\) and \(s'\), we write \(s' \in T(s)\) if \(s'\) is
the result of applying all the assignments of \(T\) on \(s\), i.e., \(T(s, s')\) evaluates to true. Note that there
might be multiple states resulting from the same transition and input state. If \(s'\) is the only such
post-state, we write \(s' = T(s)\). We now define the notion of symbolic image computation.

Definition 4.1 Let \(T = (\mathcal{S}, \mathrm{Dom}, V, A)\) be a path transducer and let \(I\) be a relational constraint over
\(\mathcal{S}\) and \(V\). The preimage is defined as \(\mathrm{Pre}(T, I) = \{s \mid s' \in T(s) \wedge s' \in I\}\), and the postimage is
\(\mathrm{Post}(T, I) = \{s' \mid s' \in T(s) \wedge s \in I\}\). When \(I\) is true, we simply write \(\mathrm{Pre}(T, I)\) and \(\mathrm{Post}(T, I)\)
as \(\mathrm{Pre}(T)\) and \(\mathrm{Post}(T)\), respectively. □

A relational constraint that represents \(\mathrm{Pre}(T, I)\) can be generated in polynomial time. This follows
from the fact that a transducer \(T\) is a collection of assignments on relations and variables: simply
replace each primed variable and relation by the RHS of its assignment. Note that we do not have a
similar result for the post-image. We now proceed to a finite scope solution for the satisfiability
problem. We illustrate the idea using an example.
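
The substitution-based pre-image computation can be sketched in a few lines. The following Python fragment is an illustrative assumption about one possible encoding, not an existing implementation: constraints are nested tuples, state symbols are strings, and the names `substitute`, `pre`, `LOGIN_GUARD`, `LOGIN_ASSIGN` and the operator tags (`"and"`, `"in"`, `"sat"`, `"select"`) are hypothetical. It primes the post condition, then replaces every primed symbol by the RHS of its assignment, using the identity for symbols the transducer leaves unchanged.

```python
def substitute(expr, mapping):
    """Replace every symbol of a nested-tuple expression that occurs as a key in `mapping`."""
    if isinstance(expr, tuple):
        return tuple(substitute(e, mapping) for e in expr)
    return mapping.get(expr, expr)

def pre(guard, assignments, post, state_symbols):
    """Pre-image of `post` under a transducer given by a guard and primed-symbol assignments."""
    # Step 1: move the post condition to the next state (x becomes x').
    primed_post = substitute(post, {v: v + "'" for v in state_symbols})
    # Step 2: symbols without an explicit assignment keep their value (x' = x).
    rhs = {v + "'": v for v in state_symbols}
    rhs.update(assignments)
    # Step 3: replace each primed symbol by its RHS and conjoin with the guard.
    return ("and", guard, substitute(primed_post, rhs))

STATE = ("#u", "U", "S", "M")

# Login.php, rule (5): ($uL, f($pL)) in U and #u = null, with #u' = $uL.
LOGIN_GUARD = ("and", ("in", ("pair", "$uL", ("f", "$pL")), "U"), ("=", "#u", "null"))
LOGIN_ASSIGN = {"#u'": "$uL"}

# Post condition: #u != null and SAT(sigma_{2=#u} M).
POST = ("and", ("!=", "#u", "null"), ("sat", ("select", 2, "#u", "M")))

pre_image = pre(LOGIN_GUARD, LOGIN_ASSIGN, POST, STATE)
```

Each symbol is visited once, so the cost is linear in the size of the constraint. Input parameters such as `$uL` are not state symbols and are therefore never primed.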



 1 sig vchar {}
 2 sig UserRecord {
 3   uname: vchar,
 4   pwd: vchar
 5 }
 6
 7 sig SessionRecord {
 8   sid: Int,
 9   sname: vchar
10 }
11
12 sig MemberRecord {
13   sid: Int,
14   uname: vchar
15 }
16
17 sig UserTable {
18   list: set UserRecord
19 }{
20   // primary key
21   all x, y: list |
22     x.uname = y.uname => x = y
23 }
24
25 sig SessionTable {
26   list: set SessionRecord
27 }{
28   // primary key
29   all x, y: list |
30     x.sid = y.sid => x = y
31 }



Figure 2: Sample ALLOY Specification 

Example 4.2 Assume that we are interested in performing the following query: list all users who are
members of session "s1" but not of session "s2". This query can be expressed using the following
relational algebra formula:

\[ \pi_4(\sigma_{2='s1'}(\sigma_{1=3}(S \times M))) - \pi_4(\sigma_{2='s2'}(\sigma_{1=3}(S \times M))) \]

The Alloy specification of the query is shown in Figure 2. The first part (lines 1 to 40) defines the
data schema: relations and the corresponding primary key constraint for each relation. Each row of a
relation is defined as a sig (data type) in Alloy, and each column is defined as an attribute. A relation
is defined as a set of the corresponding rows (records). Constraints (such as primary and foreign keys)
can be defined conveniently as first order relational logic formulae. The aboutRecords fact asks Alloy
to perform the search on those records in a relation only.
   fact aboutRecords {
   ...
   // The following are for query
48 one sig c_s1, c_s2 extends vchar {}
49 pred part1 [d: vchar] {
50   some a, c: Int | some b: vchar |
       a = c && b in c_s1 &&
       ... : y.list |
53     x.sid = a && x.sname = b)
54     && (some y: MemberTable | some x: y.list |
55   ...
61 }

The second part of the Alloy specification encodes the query. It essentially follows the idea of
converting a relational algebra formula to a first order logic query. The predicate query encodes the
desired difference operation. By running the query for exactly 1 instance of each data relation and 3
instances of data records for each relation, a database instance that satisfies the query is generated in
78 ms, on a laptop PC with 4 GB memory and a 3 GHz CPU. It is shown below:

1. SessionTable: {(vchar3, c_s1)}

2. MemberTable: {(vchar3, vchar1)}

3. UserTable: {(vchar1, vchar3)}

Here all foreign key constraints are followed, e.g., the sid column (vchar3) of the only record of
MemberTable is the same as the one in SessionTable. c_s1 denotes the constant s1, and it is the value
of the sname of the SessionRecord. For simplicity, the uninterpreted function f (encryption) is not
encoded in this example. It is available in the Alloy model for Step 5 in §5. □
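
To make the notion of finite-scope model finding concrete, the same query can be solved by naive enumeration. The sketch below is written in Python and swaps in brute-force search for Alloy's SAT-based model finder; the function names, the integer session ids and the scope parameters are assumptions for illustration, and the Users table with its foreign key is omitted to keep the example short.

```python
from itertools import combinations, product

def members_of(S, M, sname):
    """pi_4(sigma_{2=sname}(sigma_{1=3}(S x M))): users in the session named `sname`."""
    return {m[1] for s, m in product(S, M) if s[0] == m[0] and s[1] == sname}

def query(S, M):
    """Users who are members of session 's1' but not of session 's2'."""
    return members_of(S, M, "s1") - members_of(S, M, "s2")

def subsets(rows, max_size):
    """All subsets of `rows` with at most `max_size` elements (the finite scope)."""
    for k in range(max_size + 1):
        for combo in combinations(rows, k):
            yield set(combo)

def find_instance(sids, snames, unames, max_rows=1):
    """Return the first (S, M) within scope that makes the query non-empty, or None."""
    s_rows = sorted(product(sids, snames))
    m_rows = sorted(product(sids, unames))
    for S in subsets(s_rows, max_rows):
        # Primary key on Sessions: sid must be unique.
        if len({sid for sid, _ in S}) != len(S):
            continue
        for M in subsets(m_rows, max_rows):
            # Foreign key: every sid in Member must exist in Sessions.
            if not all(any(sid == s[0] for s in S) for sid, _ in M):
                continue
            if query(S, M):
                return S, M
    return None

found = find_instance([1, 2], ["s1", "s2"], ["u1"])
```

Enumeration grows exponentially with the scope, which is why a model finder such as Alloy is used in the paper; the sketch only shows what "a model within a finite scope" means.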

5 Call Sequence Synthesis 

This section introduces the synthesis algorithm that generates a test case composed of a sequence of 
HTTP requests. The purpose is to reach a certain database state for achieving coverage goals. We begin 
with some formal definitions needed for discussions later. 

Definition 5.1 An HTTP request \(r\) is a tuple \((u, p)\) where \(u\) is the base URL, and \(p\) is a finite set of
parameters. Each parameter is a tuple \((n, v)\) where \(n\) is the name of the parameter and \(v\) is its value.
\(p[n]\) denotes the value of parameter \(n\). □

Definition 5.2 A call sequence is a finite sequence of HTTP requests \(r_1, \ldots, r_n\). We call the
corresponding HTTP responses \(s_1, \ldots, s_n\). Each \(s_i\) is a string that represents the contents of the
HTML file returned. □

A call sequence is also written as \((r_1, s_1), (r_2, s_2), \ldots, (r_n, s_n)\) when HTTP responses need to be
considered. We call a response bad if it contains error messages such as an HTTP 500 internal server error.

5.1 Call Sequence Synthesis Algorithm 

Figure 3 presents a heuristic algorithm that synthesizes a call sequence of web servlets. Its input
includes (1) initial database states specified using a relational constraint \(S_0\), (2) a finite set \(\mathcal{T}\) of
transducers, and (3) the objective states represented using relational constraint \(\varphi\).

As shown in Figure 3, the backtracking algorithm attempts to simulate the changes of system states,
using pre-image computations. Here \(S\) is a stack which stores the call sequence. Each element of \(S\) is a
tuple which records the current servlet being attempted and the corresponding system states (represented
using a relational constraint). Every iteration of the backtracking loop tries to identify a new servlet
which modifies the system state, geared towards the direction of the initial states \(S_0\). The current system
states (represented using \(\varphi\)) are compared with \(S_0\) using function getModified. It tries to extract the
set of variables and relations whose "value domain" changes between the two sets of symbolic states. The
value domain of a variable \(v\) in constraint \(\varphi_i\) is formally defined as \(\varphi_i(v) = \{a \mid s \in \varphi_i \wedge s[v] = a\}\).
Here \(s[v]\) is the value of variable \(v\) in state \(s\). Clearly \(\varphi_i(v)\) can be computed using existential
elimination of all other variables/relations in the formula of \(\varphi_i\). If a finite scope is given, this can be
achieved using first order relational logic in Alloy [17]. Once the "difference" of the two symbolic
constraints \(\varphi_1\) and \(\varphi_2\) is found, each servlet is statically examined. A servlet that modifies some
variables/relations in \(M_1 - M_2\) is identified and pushed to stack \(S\). If none is found, the search process
backtracks. It proceeds until the initial state constraint \(S_0\) has some intersection with the current
system state; or the algorithm fails and returns an empty stack.

We use a case study to illustrate the effectiveness of the algorithm. Applying the algorithm to
real-world examples remains future work. The heuristic works better with INSERT statements than
with UPDATE statements.

 1 // \(S_0\): desired initial states, \(\varphi\): desired path condition, \(\mathcal{T}\): a finite set of path transducers
 2 Procedure CallSeqGen(\(\mathcal{T}\), \(S_0\), \(\varphi\))
 3   Let \(S = [(\bot, \varphi)]\)
 4   // S is a stack which stores the intended call sequence to return
 5   // Each element of S is a tuple of (action, current state)
 6   while (\(S_0 \cap \varphi = \emptyset\) and \(|S| \geq 1\)) {
 7     Let \(\varphi_1\) and \(\varphi_2\) be the states stored in the top 2 elements of S. If \(|S| = 1\), let \(\varphi_2\) be true
 8     Let \(M_1\) = getModified(\(\varphi_1\), \(S_0\))
 9     Let \(M_2\) = getModified(\(\varphi_2\), \(S_0\))
10     Find a path transducer \(s \in \mathcal{T}\) which modifies some target \(v \in M_1 - M_2\) and \(\mathrm{Pre}(s, \varphi) \neq false\)
11     If no servlet is available then backtrack:
12       remove the first element of S; continue
13     \(\varphi := \mathrm{Pre}(s, \varphi)\); \(S := (s, \varphi) \circ S\)
14   }
15   return S
16 // return a set of session variables and relations that are changed between the two sets of states represented by \(\varphi_1\) and \(\varphi_2\)
17 Procedure getModified(\(\varphi_1\), \(\varphi_2\))
18   Let R = {} // R is the set of variables and relations to return
19   foreach \(v \in V \cup \mathcal{S}\): // v could be a session variable or relation
20     Let \(\varphi_1(v) = \{a \mid s \in \varphi_1 \wedge s[v] = a\}\) // here s is a state belonging to \(\varphi_1\) and s[v] is the value of v in s
21     Let \(\varphi_2(v) = \{a \mid s \in \varphi_2 \wedge s[v] = a\}\)
22     if \(\varphi_1(v) - \varphi_2(v) \neq \emptyset\) or \(\varphi_2(v) - \varphi_1(v) \neq \emptyset\): \(R = R \cup \{v\}\)
23   return R

Figure 3: Call Sequence Generation Algorithm
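
The getModified procedure can be illustrated on explicit, finite sets of states. The Python sketch below is an assumption-laden simplification: a symbolic constraint is replaced by a hand-picked finite list of states (dictionaries from symbol names to values, with relations as frozensets), and the sample lists `s0` and `n4` are hypothetical stand-ins for the initial constraint and for the objective constraint of the case study with S and U fixed to empty.

```python
def value_domain(states, v):
    """phi(v) = { a | s in phi and s[v] = a } for an explicit finite set of states."""
    return {s[v] for s in states}

def get_modified(phi1, phi2, symbols):
    """Symbols whose value domain differs between the two sets of states.

    phi1(v) - phi2(v) != {} or phi2(v) - phi1(v) != {} is the same as phi1(v) != phi2(v).
    """
    return {v for v in symbols if value_domain(phi1, v) != value_domain(phi2, v)}

EMPTY = frozenset()
SYMBOLS = ["#u", "S", "M", "U"]

# Initial states: nobody logged in, empty database.
s0 = [{"#u": None, "S": EMPTY, "M": EMPTY, "U": EMPTY}]

# Hypothetical finite sample of the objective: user "a" logged in and recorded in M.
n4 = [{"#u": "a", "S": EMPTY, "M": frozenset({(1, "a")}), "U": EMPTY}]

targets = get_modified(n4, s0, SYMBOLS)
```

Here `targets` contains `#u` and `M`, which is why Login.php (it sets #u) and Addmember.php (it updates M) are the candidate predecessors of Showsessions.php in the case study.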

5.2 Case Study 

The coverage goal is to reach line 45 in Listing 1. We would like to synthesize a call sequence that
reaches the line. The initial state is expressed using formula \(\#u = null \wedge S = M = U = \emptyset\), i.e., the
database is empty and none of the global session variables is set. The target constraint \(\varphi\) is represented
by relational constraint \(\#u \neq null \wedge SAT(\sigma_{2=\#u} M)\), i.e., table M has at least one record whose second
attribute "uname" has the value #u (the user name contained in session variable uname). This is
essentially the pre-condition of a path transducer of Showsessions.php that reaches line 45.

Step 1: We start with the path transducer of Showsessions.php. The first step is to compute the
pre-image of the transducer.

\[ \mathrm{Pre}(\#u \neq null \wedge SAT(\sigma_{2=\#u} M) \wedge S' = S \wedge U' = U \wedge M' = M \wedge \#u' = \#u,\ true) \qquad (6) \]

Recall that Pre takes two parameters: (1) the transition relation, and (2) the post condition. The
algorithm of Pre is straightforward. First, convert all "current state" variables in the post condition
(i.e., the second parameter of Pre) to "next state", construct the conjunction of the two parameters, and
then replace each post state variable (e.g., "S'") with its RHS (e.g., "S") in the assignment of the
transition relation. We then obtain the precondition. Notice that this algorithm is based on the
assumption that in each assignment, post-state variables occur on the LHS only. For step 1, the
resulting pre-image is shown below:

\[ N_4 : \#u \neq null \wedge SAT(\sigma_{2=\#u} M) \]

The synthesis algorithm cannot terminate here, because the goal \(S_0 \cap N_4 \neq \emptyset\) is not met. We need
to determine which action precedes Showsessions.php. The candidates are Login.php, which sets
#u, and Addmember.php, which updates M. Without loss of generality, we choose Login.php as the
preceding action of Showsessions.php.

Step 2: The current call sequence is LS (Login.php and then Showsessions.php). The post condition
of L is \(N_4\). We can compute its pre-image as follows.

\[ \mathrm{Pre}(\ (\$u_L, f(\$p_L)) \in U \wedge \#u = null \wedge \#u' = \$u_L \wedge U' = U \wedge S' = S \wedge M' = M\ \wedge \]
\[ \#u' \neq null \wedge SAT(\sigma_{2=\#u'} M')\ ) \]

Here the first row is the transition rule for Login.php, and the second row is the primed form of \(N_4\).
We use the one parameter version of the Pre function here, and the post condition has to be primed.
The same algorithm results in the following pre-image. Note that here \(\$u_L\) and \(\$p_L\) are the input
parameters (uname and pwd) of Login.php.

\[ N_3 : (\$u_L, f(\$p_L)) \in U \wedge \#u = null \wedge \$u_L \neq null \wedge SAT(\sigma_{2=\$u_L} M) \]

Step 3: Next, Addmember.php is identified by the heuristic algorithm as the preceding action. The
pre-image is computed as below. As usual, the first row is the transition rule and the second row is the
primed form of \(N_3\). Notice that \(\$u_L\) is the input parameter for Login.php. It is treated as a symbolic
literal (its value cannot change during the execution). We do not have to "prime" it in the post condition.

\[ \mathrm{Pre}(\ M' = M \cup (\pi_1(\sigma_{2=\$s_A}(S)) \times \{(\$u_A)\}) \wedge U' = U \wedge S' = S \wedge \#u' = \#u\ \wedge \]
\[ (\$u_L, f(\$p_L)) \in U' \wedge \#u' = null \wedge \$u_L \neq null \wedge SAT(\sigma_{2=\$u_L} M')\ ) \]

Here \(\$u_A\) and \(\$s_A\) are the input parameters (uname and session name) of Addmember.php. The
pre-image computation results in the following state constraint:

\[ SAT(\sigma_{2=\$u_L}(M \cup (\pi_1(\sigma_{2=\$s_A}(S)) \times \{(\$u_A)\}))) \wedge (\$u_L, f(\$p_L)) \in U \wedge \#u = null \wedge \$u_L \neq null \]

Step 4: The pre-image computation is listed below for Generaloptions.php (to add a user). Here
\(\$u_G\) and \(\$p_G\) are the input parameters (uname and pwd).

\[ \mathrm{Pre}(\ U' = U \cup \{(\$u_G, f(\$p_G))\} \wedge r(\$p_G) \wedge M' = M \wedge S' = S \wedge \#u' = \#u\ \wedge \]
\[ SAT(\sigma_{2=\$u_L}(M' \cup (\pi_1(\sigma_{2=\$s_A}(S')) \times \{(\$u_A)\}))) \wedge (\$u_L, f(\$p_L)) \in U' \wedge \#u' = null \wedge \$u_L \neq null\ ) \]

It results in the following:

\[ N_1 : r(\$p_G) \wedge SAT(\sigma_{2=\$u_L}(M \cup (\pi_1(\sigma_{2=\$s_A}(S)) \times \{(\$u_A)\}))) \wedge (\$u_L, f(\$p_L)) \in U \cup \{(\$u_G, f(\$p_G))\} \wedge \#u = null \wedge \$u_L \neq null \]

Since \(S_0 \cap N_1 = \emptyset\), the heuristic algorithm has to proceed to modify table S.
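Step 5: Using constraint \(N_1\) as the post condition of Insertsession.php, we have:

\[ \mathrm{Pre}(\ S' = S \cup \{(|S| + 1, \$s_I)\} \wedge M' = M \wedge U' = U \wedge \#u' = \#u\ \wedge \]
\[ r(\$p_G) \wedge SAT(\sigma_{2=\$u_L}(M' \cup (\pi_1(\sigma_{2=\$s_A}(S')) \times \{(\$u_A)\}))) \wedge (\$u_L, f(\$p_L)) \in U' \cup \{(\$u_G, f(\$p_G))\} \wedge \#u' = null \wedge \$u_L \neq null\ ) \]

This leads to the final constraint \(N_0\):

\[ N_0 : r(\$p_G) \wedge SAT(\sigma_{2=\$u_L}(M \cup (\pi_1(\sigma_{2=\$s_A}(S \cup \{(|S| + 1, \$s_I)\})) \times \{(\$u_A)\}))) \wedge (\$u_L, f(\$p_L)) \in U \cup \{(\$u_G, f(\$p_G))\} \wedge \#u = null \wedge \$u_L \neq null \]

If we test \(S_0 \cap N_0\), the following is an assignment which provides satisfiability:

\[ M = S = U = \emptyset \wedge \$u_L = \$u_A = \$u_G = a \wedge \$p_G = \$p_L = b \wedge \$s_I = \$s_A = c \wedge \#u = null \]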

Here \(a\), \(b\), \(c\) are three constants generated by the model finder. The constraint \(r(b)\) can be discharged
separately using a string constraint solver like [27] and [13]. When the string constraint \(r(b)\) is ignored,
Alloy Analyzer is able to generate the model in 1.07 seconds and solve the model in 57 ms. In the
model generated by Alloy, constant \(a\) is equal to \(c\), and the encryption function is properly modeled in
Alloy and it has the property: \(\forall a, b : f(a) = f(b) \Leftrightarrow a = b\).
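
The satisfying assignment can be sanity-checked by replaying the synthesized call sequence IGALS on a concrete state. The Python sketch below makes several assumptions that are not part of the paper: each servlet is reduced to its single path transducer from Section 3, the string constraint r is ignored, and the encryption function f is replaced by a hypothetical injective stand-in.

```python
def f(pwd):
    """Stand-in (assumption) for the deterministic, injective encryption function f."""
    return "enc(" + pwd + ")"

def new_state():
    """Initial state S_0: empty database, session variable #u unset."""
    return {"#u": None, "S": set(), "M": set(), "U": set()}

def insertsession(st, s_i):
    """Rule (2): S' = S U {(|S|+1, $sI)}."""
    st["S"].add((len(st["S"]) + 1, s_i))

def generaloptions(st, u_g, p_g):
    """Rule (4) without the string constraint r: U' = U U {($uG, f($pG))}."""
    st["U"].add((u_g, f(p_g)))

def addmember(st, u_a, s_a):
    """Rule (3): M' = M U (pi_1(sigma_{2=$sA}(S)) x {($uA)})."""
    st["M"] |= {(sid, u_a) for sid, sname in st["S"] if sname == s_a}

def login(st, u_l, p_l):
    """Rule (5): requires ($uL, f($pL)) in U and #u = null; sets #u' = $uL."""
    if (u_l, f(p_l)) in st["U"] and st["#u"] is None:
        st["#u"] = u_l
        return True
    return False

def reaches_line_45(st):
    """Rule (1): #u != null and SAT(sigma_{2=#u} M)."""
    return st["#u"] is not None and any(u == st["#u"] for _, u in st["M"])

def replay(a, b, c, with_addmember=True):
    """Run I, G, A, L from the empty state and return the resulting state."""
    st = new_state()
    insertsession(st, c)
    generaloptions(st, a, b)
    if with_addmember:
        addmember(st, a, c)
    login(st, a, b)
    return st

goal_reached = reaches_line_45(replay("a", "b", "c"))
```

Dropping Addmember.php from the sequence leaves M empty, so the guard of rule (1) fails even though the login succeeds.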

6 Detecting Workflow Attack 

This section briefly discusses the extension of the algorithm for detecting workflow attacks [3]. Since
web servlets are openly accessible to end users, hackers could potentially access the web application
and violate its intended "workflow" logic (e.g., shipping a product before payment is handled). Such
an attack can cause great financial losses. To model the "intended" workflow of a web application, we
could apply string analysis like [6] and collect the URLs that are contained in the HTML generated by
a servlet. Then, a workflow attack can be defined formally.

Definition 6.1 An enhanced path transducer is a tuple \(T = (\mathcal{S}, \mathrm{Dom}, V, A, L, U)\) where \(\mathcal{S}\), \(\mathrm{Dom}\), \(V\), \(A\)
are as defined in Definition 3.1, \(L\) is a set of transducers that \(T\) can navigate to, and \(U\) is the base URL
where the corresponding servlet is deployed. □

Given a collection of path transducers, we use \(L(U)\) to denote the union of the \(L\) components of all
path transducers that have \(U\) as the base URL.

Definition 6.2 Given a web application that consists of \(\mathcal{T}\), a finite set of path transducers, a workflow
attack is a call sequence \((r_1, s_1), \ldots, (r_n, s_n)\) where none of the responses is bad and there exists
\(i \in [1, n-1]\) s.t. \(r_{i+1}[0] \notin L(r_i[0])\). Here \(r[0]\) refers to the first element, i.e., the request URL, of
a request \(r\). □

The CallSeqGen algorithm in Figure 3 can be slightly modified to discover workflow attacks. The
inputs to the algorithm are: (1) \(\mathcal{T}\), the collection of enhanced path transducers; (2) \(S_0 : \forall v \in V.\ v = null\),
the initial state where all session variables are null (the database need not be empty); and (3)
true as the desired post condition. Then at line 10 of Figure 3, append the additional condition shown
below when selecting \(s\):

\[ |S| = 2 \Rightarrow s' \notin L(s) \]

where \(s\) is the path transducer to be selected, \(s'\) is the second top element currently in the stack, and
\(L(s)\) is the \(L\) component of \(s\). Enforcing \(|S| = 2\) is to find the shortest attack string, such that the
violating action is the last one in the sequence. Note that due to the backtracking nature of the
algorithm, if the current guess of \(s\) does not work, the algorithm will trace back to find another candidate.
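
Definition 6.2 translates directly into a check over a recorded call sequence. The Python sketch below is illustrative only: the link map `LINKS`, the shop-style URLs and the error marker used by `is_bad` are hypothetical, and the map from a URL to the URLs it links to plays the role of \(L(U)\).

```python
def is_bad(response):
    """A response is bad if it contains an error message (hypothetical marker)."""
    return "Internal Server Error" in response

def is_workflow_attack(calls, links):
    """Check Definition 6.2 on a call sequence.

    calls: list of (url, response) pairs, i.e. (r_i[0], s_i).
    links: dict mapping a URL to the set of URLs its pages navigate to, i.e. L(U).
    """
    if any(is_bad(resp) for _, resp in calls):
        return False
    # There must exist consecutive requests whose second URL is not linked from the first.
    return any(nxt not in links.get(cur, set())
               for (cur, _), (nxt, _) in zip(calls, calls[1:]))

# Hypothetical intended workflow: cart -> pay -> ship.
LINKS = {
    "cart.php": {"pay.php"},
    "pay.php": {"ship.php"},
    "ship.php": set(),
}

intended = [("cart.php", "<html>cart</html>"),
            ("pay.php", "<html>paid</html>"),
            ("ship.php", "<html>shipped</html>")]

# Shipping requested directly after the cart page, skipping payment.
attack = [("cart.php", "<html>cart</html>"),
          ("ship.php", "<html>shipped</html>")]
```

With these inputs, `is_workflow_attack(intended, LINKS)` is false and `is_workflow_attack(attack, LINKS)` is true; a sequence whose out-of-order request only produces an error page is not counted as an attack.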
7 Related Work and Conclusion 

This work is closely related to testing database applications, e.g., code coverage and database unit
testing [9, 12, 5]. Recently, query aware database generators have been reported to significantly
improve the size and quality of test cases (see [20]). Our work goes one step beyond database
generation: we synthesize the call sequence of web application servlets.

The general satisfiability problem for relational databases is undecidable [?]. In practice, one has to
adopt either approximation or solving decidable fragments. Emmi, Majumdar, and Sen generate database
input for programs in [12], where a decidable fragment of SQL is handled that does not allow join and
negation. In [4], a reverse query processing technique is developed, which introduces approximation
when handling negation. In [20], Khalek et al. translate SQL queries to relational logic formulae, and
use Alloy Analyzer [17] to perform model finding. We adopt an approach similar to that of Khalek et al.

Extracting the interface of a web servlet has been investigated by Halfond and Orso [14, 15], where
static analysis is used to identify the parameters accepted by a web application. In [16], they went a
step further to collect path conditions as web servlet interfaces. In this paper, the interface extraction
concentrates on the manipulation of backend databases. Each web application servlet is modeled as a set
of path transducers, which is inspired by the relational transducer model introduced by Abiteboul et
al. [2] for modeling electronic commerce. Automated verification of relational transducers is discussed
in [23, 10]. The problem is in general undecidable.

Testing web applications has its unique challenges. Due to the existence of server states, e.g., session
variables and HTTP cookies, a test case of web applications usually consists of a sequence of HTTP
requests. This is often called session based testing [22, 11, 24]. In the aforementioned work, HTTP
sessions are either manually created or collected by parsing Apache server logs. The technique presented
in this paper synthesizes test cases automatically. It has the potential to improve code coverage.

We have outlined a framework for automatically generating test cases of web applications. The key 
idea is to first extract the interface of each web servlet as a set of single path relational transducers. 
Then we could solve symbolic constraints on relational databases and synthesize a call sequence of web 
servlets. Our future work includes the implementation of the proposed technique and investigation of 
more efficient constraint solving techniques. 